## fixed.acidity volatile.acidity citric.acid residual.sugar chlorides
## 1 7.4 0.70 0.00 1.9 0.076
## 2 7.8 0.88 0.00 2.6 0.098
## 3 7.8 0.76 0.04 2.3 0.092
## 4 11.2 0.28 0.56 1.9 0.075
## 5 7.4 0.70 0.00 1.9 0.076
## 6 7.4 0.66 0.00 1.8 0.075
## free.sulfur.dioxide total.sulfur.dioxide density pH sulphates alcohol
## 1 11 34 0.9978 3.51 0.56 9.4
## 2 25 67 0.9968 3.20 0.68 9.8
## 3 15 54 0.9970 3.26 0.65 9.8
## 4 17 60 0.9980 3.16 0.58 9.8
## 5 11 34 0.9978 3.51 0.56 9.4
## 6 13 40 0.9978 3.51 0.56 9.4
## quality
## 1 5
## 2 5
## 3 5
## 4 6
## 5 5
## 6 5
## 'data.frame': 1599 obs. of 12 variables:
## $ fixed.acidity : num 7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
## $ volatile.acidity : num 0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
## $ citric.acid : num 0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
## $ residual.sugar : num 1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
## $ chlorides : num 0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
## $ free.sulfur.dioxide : num 11 25 15 17 11 13 15 15 9 17 ...
## $ total.sulfur.dioxide: num 34 67 54 60 34 40 59 21 18 102 ...
## $ density : num 0.998 0.997 0.997 0.998 0.998 ...
## $ pH : num 3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 ...
## $ sulphates : num 0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
## $ alcohol : num 9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
## $ quality : int 5 5 5 6 5 5 5 7 7 5 ...
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.000 5.000 6.000 5.636 6.000 8.000
##
## 3 4 5 6 7 8
## 10 53 681 638 199 18
This analysis looks into how different chemical properties relate to wine quality. The dataset has 1599 observations of 12 variables: the quality and 11 chemical properties of the wine (fixed.acidity, volatile.acidity, citric.acid, residual.sugar, chlorides, free.sulfur.dioxide, total.sulfur.dioxide, density, pH, sulphates, and alcohol).
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.000 5.000 6.000 5.636 6.000 8.000
The quality of the wines runs from 3 to 8, and most of the wines score either a 5 or a 6. Quality is a discrete variable with a mean of 5.6.
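As a quick check, the mean and the share of wines scoring 5 or 6 can be recomputed from the quality counts printed above. This is a minimal sketch; the variable names (`quality_counts`, `share_5_or_6`, and so on) are illustrative and not taken from the original analysis code.

```r
# Counts per quality score, copied from the table() output above
quality_counts <- c("3" = 10, "4" = 53, "5" = 681, "6" = 638, "7" = 199, "8" = 18)
scores <- as.integer(names(quality_counts))

# Total number of wines; should match the 1599 observations
n_wines <- sum(quality_counts)

# Mean of a discrete variable: weighted average of the scores
mean_quality <- sum(scores * quality_counts) / n_wines

# Fraction of wines with a quality of 5 or 6
share_5_or_6 <- sum(quality_counts[c("5", "6")]) / n_wines

round(mean_quality, 3)
round(share_5_or_6, 3)
```

The weighted average reproduces the mean reported by `summary()`, and roughly four out of five wines fall in the two middle categories.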
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.60 7.10 7.90 8.32 9.20 15.90
Fixed acidity has a right-tailed distribution with a median of 7.9 g/dm^3. The outliers on the right pull the mean up to 8.3 g/dm^3.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1200 0.3900 0.5200 0.5278 0.6400 1.5800
Volatile acidity is roughly normally distributed, with a few outliers to the right. The median and mean are close together, at 0.5200 and 0.5278 g/dm^3 respectively.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.090 0.260 0.271 0.420 1.000
The citric acid distribution is unusual. It looks like two right-skewed distributions on top of each other.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.900 1.900 2.200 2.539 2.600 15.500
The residual sugar distribution has a sharp peak at around 2.2 g/dm^3. Although the tail is long, the peak is so far above the rest that the mean (2.54 g/dm^3) is pulled only a little to the right of the median (2.2 g/dm^3). The boxplot is also rather flat, because the number of wines with a residual sugar around 2.2 g/dm^3 is huge compared to the rest.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.01200 0.07000 0.07900 0.08747 0.09000 0.61100
The chlorides distribution is also sharply peaked, this time around 0.079 g/dm^3.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 7.00 14.00 15.87 21.00 72.00
The free sulfur dioxide distribution has a tail on the right. The median and mean are 14.00 and 15.87 mg/dm^3 respectively.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 6.00 22.00 38.00 46.47 62.00 289.00
The total sulfur dioxide distribution is right-tailed, with some outliers very far to the right. The median and the mean are 38.00 and 46.47 mg/dm^3 respectively, with a maximum of 289 mg/dm^3!
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9901 0.9956 0.9968 0.9967 0.9978 1.0037
The density is approximately normally distributed with a mean of 0.9967 g/cm^3.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.740 3.210 3.310 3.311 3.400 4.010
The pH is approximately normally distributed with a mean of 3.311.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.3300 0.5500 0.6200 0.6581 0.7300 2.0000
The sulphates distribution is right-skewed, with some outliers far to the right.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.40 9.50 10.20 10.42 11.10 14.90
The alcohol percentage distribution is right-skewed, with a median of 10.20% and a mean of 10.42%.
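The skew judgements above rest on comparing the mean with the median. A rough way to put those comparisons side by side is to divide the gap between mean and median by the interquartile range. The sketch below uses only numbers copied from the `summary()` outputs; the `shift` column is an illustrative indicator introduced here, not a statistic from the original analysis, and it is no substitute for looking at the histograms.

```r
# Quartiles, medians and means copied from the summary() outputs above
stats <- data.frame(
  variable = c("fixed.acidity", "volatile.acidity", "residual.sugar",
               "total.sulfur.dioxide", "density", "pH", "alcohol"),
  q1     = c(7.10, 0.39,   1.90,  22.00, 0.9956, 3.210,  9.50),
  median = c(7.90, 0.52,   2.20,  38.00, 0.9968, 3.310, 10.20),
  mean   = c(8.32, 0.5278, 2.539, 46.47, 0.9967, 3.311, 10.42),
  q3     = c(9.20, 0.64,   2.60,  62.00, 0.9978, 3.400, 11.10)
)

# Gap between mean and median, scaled by the interquartile range.
# Positive values mean the mean sits to the right of the median.
stats$shift <- (stats$mean - stats$median) / (stats$q3 - stats$q1)

# Largest rightward shift first
stats[order(-stats$shift), c("variable", "shift")]
```

Under this measure, density, pH, and volatile acidity come out close to zero, in line with the roughly symmetric shapes described above. Residual sugar comes out highest, because its interquartile range is very narrow compared with its long tail.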
The red wine dataset originally had 1599 rows and 13 columns. I deleted the column X because it simply repeats the row number, which left 12 columns: the quality and 11 chemical properties of the wine (fixed.acidity, volatile.acidity, citric.acid, residual.sugar, chlorides, free.sulfur.dioxide, total.sulfur.dioxide, density, pH, sulphates, and alcohol). The quality is discrete, with observed values 3, 4, 5, 6, 7, and 8; the other variables are continuous.
The most important feature of the dataset is quality. I am interested in how the quality is affected by the other properties of the wine.
At this point we have not yet analyzed the data. As I don't drink alcohol and have never tasted wine, I am guessing here. As people in general seem to like sugar and alcohol, I would expect those properties to have a positive impact on the quality. I would expect sulfur dioxide, sulphates, and chlorides to have a negative effect.
No. I only deleted the column X, as the sample number and the row number are the same.
The citric acid distribution is very unusual. It looks like two right-skewed distributions on top of each other. Other than removing the column X as described above, I did not change the data.
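The removal of column X can be sketched as follows. The example rebuilds a few columns of the first six rows shown in the `head()` output; the values of `X` are an assumption (the column is only described as repeating the row number), and the data frame name `wine` is illustrative.

```r
# First six rows of three columns, copied from the head() output.
# X = 1:6 is an assumption: the report only says X equals the row number.
wine <- data.frame(
  X                = 1:6,
  fixed.acidity    = c(7.4, 7.8, 7.8, 11.2, 7.4, 7.4),
  volatile.acidity = c(0.70, 0.88, 0.76, 0.28, 0.70, 0.66),
  quality          = c(5L, 5L, 5L, 6L, 5L, 5L)
)

# Confirm that X carries no information beyond the row number
stopifnot(all(wine$X == seq_len(nrow(wine))))

# Drop the redundant column
wine$X <- NULL

names(wine)
```

Checking that `X` matches the row number before deleting it guards against silently dropping a real identifier.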
I am interested in the chemical properties that have an effect on the quality of the wine. I intend to plot the quality against each of the available properties, but first let us make a correlation table to get a first idea of which properties are most strongly related to the quality of the wine.
## [1] "Correlation of wine quality with different properties"
## fixed.acidity volatile.acidity citric.acid
## 0.12405165 -0.39055778 0.22637251
## residual.sugar chlorides free.sulfur.dioxide
## 0.01373164 -0.12890656 -0.05065606
## total.sulfur.dioxide density pH
## -0.18510029 -0.17491923 -0.05773139
## sulphates alcohol
## 0.25139708 0.47616632
We see that alcohol, sulphates, citric acid, and fixed acidity have a positive correlation with the wine quality. Volatile acidity, total sulfur dioxide, density, and chlorides have a negative correlation. Residual sugar, free sulfur dioxide, and pH all have correlations below 0.06 in absolute value and don't seem to do much.
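To make the ordering easier to read, the coefficients printed above can be ranked by absolute value. This is a small sketch that works on the printed numbers only; the 0.1 cut-off for "weak" is an arbitrary threshold chosen here for illustration, and the commented `cor()` call is a guess at how such a table could be produced, not the original code.

```r
# Correlations with quality, copied from the output above.
# On the full data these could come from something like
#   cor(wine[, names(wine) != "quality"], wine$quality)
# (assumed call, not taken from the original report).
cors <- c(
  fixed.acidity        =  0.12405165,
  volatile.acidity     = -0.39055778,
  citric.acid          =  0.22637251,
  residual.sugar       =  0.01373164,
  chlorides            = -0.12890656,
  free.sulfur.dioxide  = -0.05065606,
  total.sulfur.dioxide = -0.18510029,
  density              = -0.17491923,
  pH                   = -0.05773139,
  sulphates            =  0.25139708,
  alcohol              =  0.47616632
)

# Strongest relation first, regardless of sign
ranked <- cors[order(abs(cors), decreasing = TRUE)]
round(ranked, 3)

# Properties below an (arbitrary) absolute correlation of 0.1
weak <- names(cors)[abs(cors) < 0.1]
weak
```

Ranked this way, alcohol and volatile acidity lead the list, while residual sugar, free sulfur dioxide, and pH fall below the cut-off.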
In this plot we see that fixed acidity has almost no effect on the wine quality.
This plot clearly shows that increasing the volatile acidity degrades the wine quality.
Increasing the citric acid concentration increases the quality of the wine.
Residual sugar seems to have no effect on the wine quality.
Chlorides have a negative impact on the wine quality.
Free sulfur dioxide is mostly present in the wines of average quality. Both the good wines and the terrible wines have a lower free sulfur dioxide concentration.
Total sulfur dioxide is mostly present in the wines of average quality. Both the good wines and the terrible wines have a lower total sulfur dioxide concentration.
Increasing the density lowers the quality of the wine.
Lowering the pH has a positive impact on the quality of the wine.
Increasing sulphates concentration increases the quality of the wine.
Increasing the alcohol concentration increases the wine quality.
We can see that increasing the alcohol percentage lowers the density. This is not surprising, as alcohol has a lower density than water. It can explain why the wine quality increases as the density is lowered. The other properties that had an effect on wine quality did not have a clear relation with alcohol percentage.
We see that increasing the citric acid concentration lowers the pH value. Not surprising.
Increasing the fixed acidity lowers the pH.
Volatile acidity does not have much effect on the pH.
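Because pH is a logarithmic scale (pH is the negative base-10 logarithm of the hydrogen ion concentration, strictly its activity), small pH differences correspond to sizeable differences in acidity. The sketch below compares two wines from the `head()` output: row 1 (fixed acidity 7.4, pH 3.51) and row 4 (fixed acidity 11.2, pH 3.16). The variable names are illustrative, and two rows are only an example, not evidence for the overall trend.

```r
# pH values of two wines, copied from the head() output
ph_row1 <- 3.51   # fixed acidity 7.4 g/dm^3
ph_row4 <- 3.16   # fixed acidity 11.2 g/dm^3

# pH = -log10([H+]), so the ratio of hydrogen ion concentrations
# between the two wines is 10 raised to the pH difference
h_ratio <- 10^(ph_row1 - ph_row4)

round(h_ratio, 2)
```

A pH difference of 0.35 thus corresponds to a bit more than twice the hydrogen ion concentration in the more acidic wine.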
## cor
## 0.6676665
The correlation between total sulfur dioxide and free sulfur dioxide is 0.668, the strongest I found in this dataset.
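A coefficient like the one above can be obtained with `cor.test()`, whose `estimate` element prints under the label `cor`. Whether the original analysis used exactly this call is an assumption. The sketch below runs it on the six rows visible in the `head()` output only, so its result will differ from the 0.668 found on all 1599 wines.

```r
# Free and total sulfur dioxide for the first six wines (from head())
free_so2  <- c(11, 25, 15, 17, 11, 13)
total_so2 <- c(34, 67, 54, 60, 34, 40)

# Pearson correlation with a significance test
result <- cor.test(free_so2, total_so2, method = "pearson")

# The coefficient itself, printed under the name "cor"
result$estimate
```

With only six observations the estimate is unstable; the value reported in the text comes from the full dataset.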
The strongest positive correlation with quality is that of the alcohol percentage. Sulphates and citric acid also had positive effects on the wine quality. The wine quality was negatively influenced by volatile acidity, chlorides, and density. The influence of density could be explained by its negative correlation with alcohol. Contrary to my expectation, residual sugar levels had little to no effect on the wine quality.
Fixed acidity and citric acid had a strong negative effect on the pH, while volatile acidity had no effect on the pH. Yet fixed acidity had no effect on the quality of the wine. So apparently the quality of the wine is not determined by the overall acidity of the wine, but is more influenced by the presence of citric acid, which has a positive effect on quality, and by volatile acidity, which has a negative effect.
The relation between total sulfur dioxide and free sulfur dioxide was the strongest relation found, with a correlation coefficient of 0.668.
Higher quality wines are produced with a higher alcohol content and a higher sulphates concentration.
In the bivariate section we saw that increasing the density lowers the quality of the wine. In this plot we see that this is mostly due to the fact that a lower density means a higher alcohol concentration (alcohol is lighter than water). The better wine quality is driven by the higher alcohol percentage, not by the lower density.
Higher volatile acidity lowers the wine quality. The pH itself has little effect.
More citric acid results in better wines. Again the influence of pH is small at best.
This plot shows the combined effect of higher citric acid and lower volatile acidity. With the exception of a few outliers the best wines are at the bottom right of this plot.
This plot shows that good wines are produced by higher alcohol and lower volatile acidity. It also shows that the effect of alcohol is larger than the effect of volatile acidity. Below an alcohol percentage of 10% it is very hard to produce a good wine.
The best wines are produced by a high alcohol percentage and a larger citric acid concentration. Again the effect of alcohol is the strongest.
The best wines are produced by a high alcohol percentage, a higher sulphates concentration (0.8 - 1.1), a higher citric acid concentration, and a low volatile acidity.
Residual sugar had almost no effect on the wine quality. Wine quality was not influenced much by the pH itself, but more by the presence of citric acid and the absence of volatile acidity.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.000 5.000 6.000 5.636 6.000 8.000
This report is about wine quality, so the first plot I have chosen is the wine quality distribution. The distribution appears to be roughly bell-shaped, with most wines having a quality of 5 or 6. The mean is 5.6. The scale runs from 0 to 10, but no wines with a quality below 3 or above 8 were found in this dataset.
The property with the largest effect on wine quality is alcohol percentage. A higher alcohol percentage gives better wines. However, as the overlap in the boxplots shows, alcohol percentage alone is not enough to guarantee a good wine.
This dataset contains information on 1599 red wines with twelve variables: the quality and eleven chemical properties. The quality is discrete and the rest are continuous. The quality is scored on a scale from 0 to 10, but no wines with a quality below 3 or above 8 were found in this dataset. This study could be improved by gathering more data on wines with very low or very high quality scores.
Wine quality improved with increasing alcohol percentage and citric acid concentration. Increasing the volatile acidity degrades the wine quality.
I was surprised to find that the residual sugar concentration had almost no influence on the wine quality. Clearly not everything gets better from adding sugar!
I was also surprised to find that increasing the sulphates concentration from 0.5 to 0.8 g/dm^3 improved the quality. Adding more sulphates did not seem to help.